The number of guards needed by a museum: a phase transition in vertex covering of random graphs






Martin Weigt and Alexander K. Hartmann 
Institute for Theoretical Physics, University of Göttingen, Bunsenstr. 9, 37073 Göttingen, Germany 
E-mail: weigt/hartmann@theorie.physik.uni-goettingen.de 
(February 1, 2008) 

In this letter we study the NP-complete vertex cover problem on finite connectivity random graphs. 
When the allowed size of the cover set is decreased, a discontinuous transition in solvability and 
typical-case complexity occurs. This transition is characterized by means of exact numerical sim- 
ulations as well as by analytical replica calculations. The replica symmetric phase diagram is in 
excellent agreement with numerical findings up to average connectivity e, where replica symmetry 
becomes locally unstable. 

Imagine you are the director of an open-air museum
situated in a large park with numerous paths. You want to
put guards on crossroads to observe every path, but in
order to economize costs you have to use as few guards
as possible. Let N be the number of crossroads, X < N
the number of guards you are able to pay. Then there
are $\binom{N}{X}$ possibilities of putting the guards, but
most of these "configurations" will lead to unobserved
paths. Deciding whether there exists any perfect solution
or finding one can thus take a time growing exponentially
with N. In fact, this problem is one of the six basic
NP-complete problems [1], namely vertex cover (VC). It is
widely believed that no algorithm can be found which
solves our problem substantially faster than exhaustive
search for any configuration of the paths. 
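
The exhaustive search mentioned above can be sketched in a few lines. This is only an illustration, not the method used later in this letter; the function name `find_guard_placement` and the small ring-shaped park are assumptions made for the example.

```python
from itertools import combinations
from math import comb


def find_guard_placement(n, paths, X):
    """Return a set of X crossroads observing every path, or None."""
    # Try all binom(n, X) placements one after the other.
    for guards in combinations(range(n), X):
        chosen = set(guards)
        if all(i in chosen or j in chosen for i, j in paths):
            return chosen
    return None


# Hypothetical park: four crossroads joined in a ring by four paths.
paths = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(comb(4, 2))                         # number of placements of 2 guards
print(find_guard_placement(4, paths, 2))  # a perfect placement exists
print(find_guard_placement(4, paths, 1))  # one guard is not enough
```

The number of candidate placements grows as $\binom{N}{X}$, which is why this direct approach is limited to very small N.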

Similar combinatorial decision problems have been
found to show interesting phase transition phenomena.
These occur in their solvability and, even more
surprisingly, in their typical-case algorithmic complexity,
i.e. the dependence of the median solution time on the
system size. For example, in satisfiability (SAT) problems
a number of Boolean variables have to simultaneously
satisfy many logical clauses. When the number of these
(randomly chosen) clauses exceeds a certain threshold, the
solvability of the full problem undergoes a sharp
transition from almost always satisfiable to almost always
unsatisfiable. The instances which are hardest to solve
are found in the vicinity of the transition point. Far
away from this point the solution time is much smaller,
as a formula is either easily fulfilled or hopelessly
over-constrained. The typical solution times in the
under-constrained phase are even found to depend only
polynomially on the system size. Recently, insight coming
from a statistical mechanics perspective on these problems
has led to a fruitful cooperation with computer
scientists, and has shed some light on the nature of this
transition. Frequently, the methods of statistical
mechanics allow one to obtain more insight than the
classical tools of computer science or discrete mathematics. 

This is also true for the above mentioned VC prob- 
lem. After having introduced the VC model and reviewed 
some previously known rigorous results, we present nu- 
merical evidence for the existence of a phase transition in 
its solvability which is connected to an exponential peak 
in the typical case complexity. Due to the much simpler 
geometrical structure, many features of this transition 
can be understood much more intuitively than for SAT. 
In addition, we will see that the replica-symmetric 
theory correctly describes the phase transition up to an 
average connectivity c = e. This is a fundamental
difference from previously studied models with
discontinuous transitions; see, for example,
3-satisfiability, where replica symmetry breaking is
necessary to calculate the transition threshold. 

Let us reformulate our problem in a mathematical way: 
Take any graph $G = (V, E)$ with N vertices
$i \in V = \{1, 2, \ldots, N\}$ (the crossroads in the above
example) and edges $(i,j) \in E \subset V \times V$ (the
paths). We consider undirected graphs, so with
$(i,j) \in E$ we also have $(j,i) \in E$. A vertex cover is
a subset $V_{vc} \subset V$ of vertices such that for all
edges $(i,j) \in E$ there is at least one of its endpoints
i or j in $V_{vc}$ (the path is observed). We call the
vertices in $V_{vc}$ covered, whereas the vertices in its
complement $V \setminus V_{vc}$ are called uncovered. Please
note that the VC of a disconnected graph is consequently
given by the union of the VCs of its connected components. 

Also partial VCs $U \subset V$ are considered, where there 
are some uncovered edges $(i,j)$ with $i \notin U$ and
$j \notin U$. The task of finding the minimum number of
uncovered edges for a given graph G and cardinality
$|U| = X$ is an optimization problem. We have already
mentioned that the corresponding decision problem, whether
a VC of fixed cardinality X exists or not, belongs to the
basic NP-complete problems. 

In order to be able to speak of typical or average cases, 
we have to introduce some probability distribution over 
graphs. We investigate random graphs $G_{N,c/N}$ with N
vertices and edges $(i,j)$ (with $i \neq j$) which are drawn
randomly and independently with probability c/N. Thus
the expected edge number equals
$\binom{N}{2} c/N = cN/2 + O(1)$. The average connectivity c
remains finite in the limit $N \to \infty$ of infinitely
large graphs. For c < 1, these graphs are known to be
decomposed into $O(N)$ connected components of typical size
$O(1)$, whereas for c > 1 a giant component appears which
unifies $O(N)$ vertices. 
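
A minimal sketch of how such a random graph could be sampled is given below. The function name `random_graph`, the seed and the chosen values of N and c are illustrative assumptions; the quadratic loop over all vertex pairs is the simplest, not the fastest, way to draw the ensemble.

```python
import random


def random_graph(n, c, rng):
    """Draw each of the n(n-1)/2 possible edges independently with probability c/n."""
    p = c / n
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if rng.random() < p]


rng = random.Random(0)   # fixed seed, chosen only for reproducibility
n, c = 400, 2.0
edges = random_graph(n, c, rng)
print(len(edges), c * n / 2)  # sampled edge number vs. the expected value cN/2
```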

As an element of a VC $V_{vc}$ typically covers $O(c)$
edges, the minimal cover size $X_{min}$ is also expected to
be of $O(N)$, $X_{min} = x_{N,c} N$. In fact, there are
rigorous lower and upper bounds on $x_{N,c}$ which are valid
for almost all random graphs. To our knowledge, the best
bounds are given in [10,11]; see Fig. 3 for a comparison
with our results. The exact asymptotics for large
connectivities $c \gg 1$ is also known. Frieze proved that 

$$x_{N,c} = 1 - \frac{2}{c}\left(\log c - \log\log c - \log 2 + 1\right) + o\!\left(\frac{1}{c}\right) \qquad (1)$$

for almost all graphs $G_{N,c/N}$, $N \to \infty$. 
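
To get a feeling for the size of the leading terms in Eq. (1), they can be evaluated numerically. The sketch below drops the $o(1/c)$ correction and assumes natural logarithms; the function name `frieze_cover_fraction` is hypothetical.

```python
import math


def frieze_cover_fraction(c):
    """Leading terms of Eq. (1); meaningful only for large c (requires c > 1)."""
    return 1.0 - (2.0 / c) * (math.log(c) - math.log(math.log(c))
                              - math.log(2.0) + 1.0)


for c in (10.0, 100.0, 1000.0):
    print(c, frieze_cover_fraction(c))
```

As expected, the fraction of vertices needed approaches one as c grows.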

It is, however, not clear whether a sharp threshold
$x_c(c) = \lim_{N\to\infty} \overline{x_{N,c}}$ exists at
finite c, with the over-bar denoting the average over the
random graph ensemble at fixed N and c. In order to get
some intuition on this point we have started our work with
exact numerical simulations. Analytic results are
presented below. 

Using an exact branch-and-bound algorithm, all 
optimal configurations at fixed X are enumerated: As 
each vertex is either covered or uncovered, there are 
$\binom{N}{X}$ possible configurations which can be arranged
as leaves of a binary (backtracking) tree. At each node of
the tree, the two subtrees represent the subproblems where
the corresponding vertex is either covered or uncovered. A 
subtree will be omitted if its leaves can be proven to
contain fewer covered edges than the best of all previously
considered configurations. The order of the vertices within 
the levels of the tree is given by their current
connectivity, i.e. only neighbors are counted which are not
yet included in the cover set. Thus, the first descent into 
the tree is equivalent to the greedy heuristic which
iteratively covers vertices by always taking the vertex with 
the highest current connectivity. 
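
The following is a simplified sketch of such a branch-and-bound search, not the implementation used for the simulations. It returns only the minimal number of uncovered edges at fixed X instead of enumerating all optimal configurations, and the bound used (the remaining covering marks can gain at most the sum of the largest current connectivities) is an assumption chosen for simplicity. The name `min_uncovered_edges` is hypothetical; the edge list is assumed to contain no self-loops or duplicates, and X must not exceed the number of vertices.

```python
def min_uncovered_edges(n, edges, X):
    """Minimum number of uncovered edges over all subsets of X vertices (X <= n)."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    best = len(edges)  # trivial upper bound: no edge covered at all

    def search(free, covered, marks, uncovered):
        # free: undecided vertices; marks: covering marks still to place;
        # uncovered: number of edges with no endpoint in `covered` so far.
        nonlocal best
        if marks == 0:
            best = min(best, uncovered)
            return
        # Current connectivity: neighbours not yet included in the cover set.
        degree = {v: len(adj[v] - covered) for v in free}
        # Bound: the remaining marks cover at most the sum of the largest degrees.
        gain = sum(sorted(degree.values(), reverse=True)[:marks])
        if uncovered - gain >= best:
            return  # this subtree cannot beat the best configuration so far
        v = max(free, key=degree.get)
        rest = free - {v}
        # Branch 1: v is covered (the first descent is the greedy heuristic).
        search(rest, covered | {v}, marks - 1, uncovered - degree[v])
        # Branch 2: v stays uncovered, if enough vertices remain for the marks.
        if len(rest) >= marks:
            search(rest, covered, marks, uncovered)

    search(frozenset(range(n)), frozenset(), X, len(edges))
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
print(min_uncovered_edges(3, triangle, 1))  # one edge must stay uncovered
print(min_uncovered_edges(3, triangle, 2))  # two vertices cover the triangle
```

A graph is coverable with X vertices exactly when the returned value is zero.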

First results are shown in Fig. 1. The probability 
of finding a vertex cover of size xN in a random graph 
$G_{N,c/N}$ is displayed for c = 2 and several values of N;
analogous results have been obtained for other values of 
c. The drop of the probability from one for large cover 
sizes to zero for small cover sets obviously sharpens with 
N, so that a jump at a well-defined $x_c(c)$ is to be
expected in the large-N limit: for $x > x_c(c)$ almost all
random graphs with cN/2 edges are coverable with xN
vertices, below $x_c(c)$ almost no graphs have such a VC.
Fig. 1 also shows the minimal fraction e of uncovered edges
as a function of x for the partial covers. It vanishes for 
$x > x_c(c)$, whereas it remains positive for $x < x_c(c)$. 

It is also interesting to measure the median
computational effort, as given by the number of visited
nodes in the backtracking tree, as a function of x and N. 
The curves, which are given in Fig. 2, show a pronounced
peak near the threshold value. Inside the coverable phase,
$x > x_c(c)$, the computational cost grows only linearly
with N, and in many cases the greedy heuristic is already
able to cover all edges by covering xN vertices. Below the
threshold, $x < x_c(c)$, the computational effort is clearly
exponential in N, but becomes smaller and smaller as we
move away from the threshold. This easy-hard-easy scenario
closely resembles the typical-case complexity pattern of
3SAT, and deserves some analytical investigation. 

To achieve this, we use the strong similarity between 
combinatorial optimization problems and statistical me- 
chanics. In the first case, a cost function depending on 
many discrete variables has to be minimized, e.g. the 
number of uncovered edges is such a cost function for 
vertex cover. This is equivalent to zero temperature sta- 
tistical mechanics, where the Gibbs weight is completely 
concentrated in the ground states of the Hamiltonian. As 
the local variables for VC are binary because a vertex is 
either covered or uncovered, we may give a canonical
one-to-one mapping of the vertex cover problem to an Ising 
model: for any subset $U \subset V$ we set $S_i = +1$ if
$i \in U$, and $S_i = -1$ if $i \notin U$. The edges are
encoded in the adjacency matrix $(J_{ij})$: an entry equals
1 iff $(i,j) \in E$, and $J_{ij} = 0$ else. $(J_{ij})$ is
thus a symmetric random matrix with independently and
identically distributed entries in its lower triangle. The
Hamiltonian, or cost function, of the system counts the
number of edges which are not covered by the elements of U, 

$$H = \sum_{i<j} J_{ij} \, \delta_{S_i,-1} \, \delta_{S_j,-1} , \qquad (2)$$

and has to be minimized under the constraint $|U| = xN$, 
which, in terms of Ising spins, reads 

$$\frac{1}{N} \sum_i S_i = 2x - 1 . \qquad (3)$$

The resulting ground state energy $e_{gs}(c, x)$ equals zero
iff the graph is coverable with xN vertices. 
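
The mapping can be checked on a small example. In the sketch below, the helper names `spins_from_subset` and `hamiltonian` and the three-vertex chain are assumptions made for illustration; the energy function is a direct transcription of Eq. (2).

```python
def spins_from_subset(n, U):
    """Ising spins: S_i = +1 for covered vertices (i in U), -1 otherwise."""
    return [1 if i in U else -1 for i in range(n)]


def hamiltonian(spins, J):
    """Eq. (2): count edges (i, j), i < j, whose two endpoints both carry spin -1."""
    n = len(spins)
    return sum(J[i][j]
               for i in range(n)
               for j in range(i + 1, n)
               if spins[i] == -1 and spins[j] == -1)


# Hypothetical example: a chain 0 - 1 - 2 with only the central vertex covered.
n = 3
J = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
spins = spins_from_subset(n, {1})
print(hamiltonian(spins, J))  # number of uncovered edges
print(sum(spins), 2 * 1 - n)  # both sides of Eq. (3) multiplied by N, for xN = 1
```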

We skip the details of the calculation, as these 
go beyond the scope of this letter. A detailed technical
description will be presented elsewhere. We only 
mention the main steps: 

(i) We introduce a positive formal temperature T 
and calculate the canonical partition function
$Z = \sum \exp\{-H/T\}$, where the sum is over all
configurations $\{S_i\}_{i=1,\ldots,N}$ which satisfy (3). 

(ii) We are interested in the disorder-averaged
free-energy density
$f(c,x) = -\lim_{N\to\infty} T N^{-1} \overline{\ln Z}$,
which we calculate using the replica method. Within the
replica symmetric framework, this free energy
self-consistently depends on the order parameter $P(m)$,
which is the histogram of local magnetizations
$m_i = \langle S_i \rangle$, where $\langle \cdot \rangle$
denotes the thermodynamic average. 

(iii) The ground states are recovered by sending $T \to 0$. 
In this limit, one has to take care of the scaling of the
order parameter with T, which is different below and above 
$x_c(c)$. Similar reasoning has been applied in the case of
3SAT. 

(iv) Both equations for $x < x_c(c)$ and $x > x_c(c)$ tend 
to the same limit for $x \to x_c(c)$. At the threshold, the 
resulting self-consistency equation can be solved
analytically. 
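
Steps (i) and (iii) can be illustrated for a single small graph by brute-force enumeration. This sketch does not perform the disorder average or the replica calculation of step (ii); it only shows that $-T \ln Z$ approaches the ground state energy as T is lowered. The function names are hypothetical.

```python
import math
from itertools import combinations


def partition_function(n, edges, X, T):
    """Z = sum of exp(-H/T) over all configurations with exactly X covered vertices."""
    Z = 0.0
    for subset in combinations(range(n), X):
        U = set(subset)
        H = sum(1 for i, j in edges if i not in U and j not in U)
        Z += math.exp(-H / T)
    return Z


def free_energy(n, edges, X, T):
    """Total free energy -T ln Z of one graph (not a density, no disorder average)."""
    return -T * math.log(partition_function(n, edges, X, T))


triangle = [(0, 1), (1, 2), (0, 2)]
for T in (1.0, 0.1, 0.01):
    # For X = 1 the ground state energy is 1, for X = 2 it is 0.
    print(T, free_energy(3, triangle, 1, T), free_energy(3, triangle, 2, T))
```

For finite T the result differs from the ground state energy by the entropic term $-T \ln 3$, which vanishes in the limit $T \to 0$.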

From this solution, many properties of the threshold 
VCs can be read off. The first is of course the value of 
the threshold itself: 

$$x_c(c) = 1 - \frac{2W(c) + W(c)^2}{2c} \qquad (4)$$

with the Lambert-W function W, defined by
$W(c)\,e^{W(c)} = c$. The result for $x_c(c)$ is displayed
in Fig. 3 along with numerical data obtained by a variant
of the branch-and-bound algorithm. For relatively small
connectivities c perfect agreement is found. We have also
compared with rigorous bounds obtained from counting VCs
for small connected components having up to 7 vertices,
which are very precise for small c (e.g. 0.999997N vertices
are taken into account for c = 0.1). Also here, perfect
coincidence was found. 

For larger c systematic deviations of (4) from numerical
results occur; it even violates the asymptotic form (1). 
For c > e, the replica symmetric solution becomes unstable,
and we find a continuous appearance of a replica 
symmetry broken solution; work is in progress on this 
point [16]. We conjecture that the replica symmetric result
(4) is exact for c < e, whereas it gives a lower bound 
for c > e [18]. Please note that this point is situated 
well beyond c = 1 where the giant component appears. 
Neither analytically nor numerically have we found any 
influence of the giant component on the vertex covers. 
This is significantly different from Ising models on
random graphs as studied previously. 

Besides the value of $x_c(c)$, the replica symmetric
solution also contains structural information. One
important phenomenon is a partial freezing of degrees of
freedom. For a given random graph, there typically exists
an exponential number of minimal VCs, thus the entropy
density is finite. On the other hand, a fraction $b_+(c)$
of the vertices will be covered in all minimal VCs, thus
forming a covered backbone; other vertices will never be
covered and are collected in the uncovered backbone which
has size $b_-(c)N$: 

$$b_-(c) = \frac{W(c)}{c} , \qquad b_+(c) = 1 - \frac{W(c) + W(c)^2}{c} . \qquad (5)$$


In Fig. 3 the total backbone size $b_c(c) = b_-(c) + b_+(c)$ 
is compared with numerical data; again very good agreement
is found in the range of validity of replica symmetry. 
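
Equations (4) and (5) are easy to evaluate numerically. The sketch below assumes the forms $x_c(c) = 1 - (2W(c) + W(c)^2)/(2c)$, $b_-(c) = W(c)/c$ and $b_+(c) = 1 - (W(c) + W(c)^2)/c$; the Newton iteration for the Lambert-W function and all function names are illustrative choices, not part of the original calculation.

```python
import math


def lambert_w(c, tol=1e-12):
    """Principal branch of W for c >= 0: solve w * exp(w) = c by Newton's method."""
    w = math.log(1.0 + c)  # starting guess, lies above the root for c >= 0
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - c) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def x_c(c):
    """Replica symmetric threshold, Eq. (4), for c > 0."""
    w = lambert_w(c)
    return 1.0 - (2.0 * w + w * w) / (2.0 * c)


def backbones(c):
    """Uncovered and covered backbone sizes (b_minus, b_plus), Eq. (5), for c > 0."""
    w = lambert_w(c)
    return w / c, 1.0 - (w + w * w) / c


for c in (0.5, 1.0, 2.0, math.e):
    b_minus, b_plus = backbones(c)
    print(c, x_c(c), b_minus, b_plus)
```

At c = e one has W(e) = 1, so the assumed expressions give $x_c(e) = 1 - 3/(2e)$. For small c the threshold behaves as c/2, as expected for a graph consisting mainly of isolated vertices and isolated edges.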

For small c, the uncovered backbone is large, which 
is mainly due to isolated vertices which have to be
uncovered in minimal VCs. The simplest structures showing
a covered backbone are subgraphs with three vertices 
and two edges. In the minimal VC of this subgraph, the 
central vertex is covered, thus belonging to the covered 
backbone, the other two are uncovered, thus belonging 
to the uncovered backbone. The simplest non-backbone 
structures are components with only two vertices and one 
edge, because the vertices have no unique covering state. 
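
These small structures can be verified by enumerating all minimal VCs directly. The sketch below is a brute-force illustration for tiny graphs only; the function names `minimum_covers` and `backbone_sets` are hypothetical.

```python
from itertools import combinations


def minimum_covers(n, edges):
    """All vertex covers of minimal size, found by exhaustive enumeration."""
    for size in range(n + 1):
        covers = [set(U) for U in combinations(range(n), size)
                  if all(i in U or j in U for i, j in edges)]
        if covers:
            return covers
    return []  # not reached: the full vertex set always covers all edges


def backbone_sets(n, edges):
    """Vertices covered in all minimal VCs, and vertices covered in none."""
    covers = minimum_covers(n, edges)
    covered = set.intersection(*covers)
    uncovered = set(range(n)) - set.union(*covers)
    return covered, uncovered


print(backbone_sets(1, []))                # isolated vertex: uncovered backbone
print(backbone_sets(3, [(0, 1), (1, 2)]))  # chain: centre covered, ends uncovered
print(backbone_sets(2, [(0, 1)]))          # single edge: no backbone at all
```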

These backbones appear discontinuously at the threshold
because inside the coverable phase the backbone is 
empty. The proof is simple ($x > x_c(c)$ fixed): 

(i) Assume that there is a non-empty uncovered backbone,
with i being an element. Now take any minimal 
cover $V_0$. It can be extended by covering arbitrarily
chosen $(x - x_c(c))N$ vertices out of $V \setminus V_0$,
e.g. vertex i, which is a contradiction to our assumption. 

(ii) Assume now a non-empty covered backbone, with i 
being an element. Then i has to be an element of $V_0$. 
As the connectivity of i is almost surely smaller than or 
equal to $O(\log N)$, all uncovered neighbors of i can be 
covered by some of the $(x - x_c(c))N$ covering marks (for 
N sufficiently large), and i can be uncovered without
uncovering the graph. This is again a contradiction to our 
assumption. 

To summarize, we have investigated the vertex cover 
problem on random graphs by means of exact numerical 
simulations and analytical replica calculations. A sharp 
transition from a coverable to an uncoverable phase is 
found by decreasing the permitted size of the cover set. 
This transition coincides with a change of the typical 
case complexity from linear to exponential growth in N 
and the discontinuous appearance of a frozen-in backbone.
The complete RS solution was given for c < e; it is 
found to be in perfect agreement with numerical results. 
For c > e the behavior is less clear as replica symmetry 
breaking occurs. 

Also the behavior inside the coverable and the
uncoverable phases is of some interest. There, the use of
variational techniques similar to those proposed
previously could be of great help. 

The authors are grateful to J. A. Berg for critically 
reading the manuscript. Financial support was provided 
by the DFG (Deutsche Forschungsgemeinschaft) under 
grant Zi209/6-1. 

FIG. 1. Probability $P_{cov}(x)$ that a cover exists for a
random graph (c = 2) as a function of the fraction x of
covered vertices. The result is shown for three different
system sizes N = 25, 50, 100 (averaged over $10^3$ to $10^4$
samples). Lines are guides to the eyes only. In the left
part, where $P_{cov} = 0$, the energy e (see text) is
positive. The inset enlarges the result for the region
0.3 < x < 0.5. 

FIG. 2. Time complexity of vertex cover: Median number
of nodes visited in the backtracking tree as a function
of the fraction x of covered vertices for graph sizes 
N = 20, 25, 30, 35, 40 (c = 2). The inset shows the region
below the threshold with logarithmic scale, including also
data for N = 45, 50. The fact that in this representation
the lines are equidistant shows that the time complexity
grows exponentially with N. 







FIG. 3. Phase diagram: critical fraction of covered
vertices as a function of the edge density c. For
$x > x_c(c)$, almost all graphs have VCs with xN vertices,
while they have almost surely no VC with $x < x_c(c)$. The
solid line shows our analytic result. The rigorous bounds
are given by dot-dashed [10] and dashed [11] lines. The
vertical line is at c = e. The circles represent the
results of the numerical simulations. Error bars are much
smaller than symbol sizes. All numerical values were
calculated from finite-size scaling fits of $x_{N,c}$ using 
functions $x_{N,c} = x_c(c) + a N^{-b}$. The inset shows the
threshold backbone size $b_c$ as a function of c: analytical
results are given by the solid line, numerical data by the
error bars. 



